An Overview of Document Mining Technology

Authors

  • Mark Dixon
  • Gerald DeJong
Abstract

Living through the Information Revolution is becoming a difficult task: humans were not designed to process massive quantities of information. The computer first found its use in speeding up our number crunching, performing large numbers of calculations blindingly fast. We are now beginning to turn to computers to address another human limitation: mining through our masses of information to find items of interest. Document mining has many uses in the information era, such as looking for patterns in commonly available texts like news feeds. How many terrorist attacks were there in 1995? Is there a strong relationship between the IRA and car bombs? Do frequent changes of company management lead to better profits? Document mining has the potential to identify patterns such as these hidden inside vast collections of text data, possibly giving companies the competitive edge they need to survive.
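A question such as "is there a strong relationship between the IRA and car bombs?" can be approximated, in its simplest form, by counting how often terms co-occur in the same document. The sketch below is only an illustration under stated assumptions, not the method described in this paper: it uses lift, P(a, b) / (P(a) P(b)) over document frequencies, as one possible association measure, and the helper names (`doc_terms`, `association_lift`) and the headlines are hypothetical toy inputs invented for this example. A lift above 1 suggests two terms appear together more often than chance would predict.

```python
import re
from collections import Counter
from itertools import combinations


def doc_terms(doc, vocabulary):
    """Return the vocabulary terms found in one document.

    Matching is case-insensitive and respects word boundaries, so "ira"
    does not match inside longer words. (Hypothetical helper.)
    """
    text = doc.lower()
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    }


def association_lift(docs, vocabulary):
    """Compute lift = P(a, b) / (P(a) * P(b)) for every co-occurring term pair.

    Probabilities are document frequencies over the collection; lift is an
    assumed choice of association measure, used here only for illustration.
    """
    n = len(docs)
    single = Counter()  # number of documents containing each term
    pair = Counter()    # number of documents containing each term pair
    for doc in docs:
        terms = doc_terms(doc, vocabulary)
        single.update(terms)
        pair.update(combinations(sorted(terms), 2))
    return {
        (a, b): (n_ab / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), n_ab in pair.items()
    }


# Hypothetical toy "news feed"; these headlines are invented for illustration.
docs = [
    "IRA claims responsibility for car bomb in London",
    "Car bomb explodes near barracks; IRA suspected",
    "Company announces new management team",
    "IRA statement on ceasefire talks",
    "Profits rise after management shake-up",
]
vocabulary = ["ira", "car bomb", "management", "profits"]

lift = association_lift(docs, vocabulary)
for (a, b), value in sorted(lift.items()):
    print(f"{a!r} + {b!r}: lift = {value:.2f}")
```

On these five toy headlines, "car bomb" and "ira" co-occur in two documents, giving a lift of about 1.67. A real system would also need to handle synonyms, coreference, and statistical significance, which simple co-occurrence counting ignores.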

Similar Resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Ancient Gold Mining Activities in India - An Overview

Gold was obtained through washing or panning of the river sands during initial periods of civilisation. With the advent of knowledge of metallurgical processing of ores it was recovered through mining of in-situ quartz reefs, and then from auriferous sulphide ores. The metal mining activities are evidenced in the form of large number of ‘ancient metal mines’ or ‘old workings’ and ‘placer mining...

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

Fault orientation modeling of Sonda-Jherruck coalfield, Pakistan

Faults are the most critical tectonic factors in geological structures, which have major economic impacts on mining economics. Thus it is necessary to understand faults in order to identify the actual risks and complications associated with mining. In the preliminary investigation of the Sonda-Jherruck coalfield, 3D geological modeling was not performed. The purpose of this work was to perform ...

Multiagent Data Mining Systems: An Overview

Data mining technology has emerged as a means for identifying patterns and trends from large quantities of data. The Data Mining technology normally adopts data integration method to generate Data warehouse, on which to gather all data into a central site, and then run an algorithm against that data to extract the useful Module Prediction and knowledge evaluation. However, a single data-mining ...

Overview of the NTCIR-9 INTENT Task

This is an overview of the NTCIR-9 INTENT task, which comprises the Subtopic Mining and the Document Ranking subtasks. The INTENT task attracted participating teams from seven different countries/regions – 16 teams for Subtopic Mining and 8 teams for Document Ranking. The Subtopic Mining subtask received 42 Chinese runs and 14 Japanese runs; the Document Ranking subtask received 24 Chinese runs...

Publication date: 1997